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Q ' Abstract 

In this review some simple models of asexual populations evolving on smooth landscapes are 

studied. The basic model is based on a cellular automaton, which is analyzed here in the spatial 

^ . mean-field limit. Firstly, the evolution on a fixed fitness landscape is considered. The correspondence 

^^ ' between the time evolution of the population and equilibrium properties of a statistical mechanics 

^^3 , system is investigated, finding the limits for which this mapping holds. The mutational meltdown, 

Eigen's error threshold and MuUer's ratchet phenomena are studied in the framework of a simplified 

>P ' model. Finally, the shape of a quasi-species and the condition of coexistence of multiple species in 

a static fitness landscape are analyzed. In the second part, these results are applied to the study of 

the coexistence of quasi-species in the presence of competition, obtaining the conditions for a robust 

speciation effect in asexual populations. 
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1 Introduction 



r^ ' The object of our study is the self organization of an ensemble of interacting individuals: we would like 

Q , to develop a model of a co-evolving ecosystem. In this paper only the very few steps of this project are 

O ' presented; we study the emerging features of asexual populations in very simple environments. 

This paper complements (and sometimes overlaps) the review by Baake and Gabriel Q . Many of the 
mathematical topics we deal with (say, error threshold, mutational meltdown, MuUer's ratchet) find there 
a correct biological perspective. We also suggest Baake and Gabriel's review as a source of commented 



C^ ' references. 



Several models exist concerning the properties of multi-species populations (see for instance Refs. [^ 
0]), however we would like to start at a more elementary level, based on individual dynamics. 

Our basic building block is a very sinrple schematization of an individual: just a string of symbols (the 
genotype) corresponding to a few phenotypic traits, which are functions of the genotype. In this way we 
model haploid individuals both without sexual reproduction and without polymorphism. In this paper 
we limit ourselves to the study of spatially homogeneous systems (spatial mean field) , which means that 
we are simulating the evolution in a well stirred container. 
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In a very fundamental model, each individual would feed on somebody else, including the external 
source of energy. A similar approach is presented in the web- world model 0, which is however based 
on the species concept, rather than on individual dynamics. In many cases, one is interested in only a 
part of an ecosystem. Very few studies (if any) take into consideration all interactions among organisms, 
say from mammals and plants to the microscopic world. One is tempted to assume that an ecosystem 
(which should in principle correspond to the whole earth) can be broken into subsystems each of which 
is weakly coupled to the rest of the world and has different characteristic times. For instance, one 
could focus on animals, considering the vegetables as a static substrate. Or, alternatively, study the 
vegetables considering the animals as self-averaging fluctuations. This simplification is valid only for a 
limited amount of time, which may be long on the individual time-scale. Eventually, a rearrangement 
(say, climate change, asteroid impact, human pollution) will heavily change the system. 

This focusing on a part of the whole system can be modeled by the concept of the fitness landscape. 
This landscape is customary defined as the average number of offsprings which reach the reproduction 
age, and in general it is originated by complex interactions with many other individuals. Since the 
outside world is considered fixed, also the interactions with individuals belonging to it do not change 
with time. Thus the reproduction rate can be divided into a static term (the static fitness landscape) 
and an interaction term, which depend on the presence of individuals in the subsystem under study. 

The internal interactions can be classified into predation (parasitism), competition or cooperation, 
depending on how the presence of an individual affects the reproduction rate of another one. In a 
fundamental description, only predation or parasitism should be present; however, the average over the 
outside world can induce the presence of competition or cooperation between two individuals in the same 
subsystem. 

The evolution on a static fitness landscape (i.e. without internal interactions among individuals) is a 
very interesting subject by itself, and will be studied in this article limited to very smooth landscapes. 
This subject is directly related to the properties of genealogical trees. Since the landscape is fixed, the 
evolution of a population can be obtained as a sum over all possible stories of individuals, in a way similar 
to using a path integral approach in equilibrium statistical mechanics. This equivalence is investigated 



further in Section 3.1. We find that on a flat enough landscape and for large enough mutation rates the 
asymptotic distribution is indeed a Boltzmann one, in which the fltness plays the role of the energy and 
mutations that of a temperature. 

One of the most striking feature of evolution is the breakdown of an uniform distribution (both in 
the genotypic and in the phenotypic space) and the formation of the clusters of individuals that are 
eventually denoted as "species". The deflnition of a species is not a trivial task. One deflnition (a) 
is based simply on the phenotypic differentiation among the clusters, and can be easily applied also to 
asexual organisms, such as bacteria. Another deflnition (b), that applies only to sexual organism, is based 
on the inter-breeding possibility. Finally (c), one can deflne the species on the basis of the genotypic 
distance of individuals, taking into consideration their genealogical story [Q. 

Many properties of genealogical trees on flat landscapes have been studied by Derrida, Higgs and 
Peliti [g], H, 01, both for asexual and sexual reproduction. Their conclusion is that species (deflned 
using the reproductive isolation, deflnition (b)) can appear in flat static landscapes provided with sexual 
reproduction and discrimination of mating. In some sense these authors have identified definitions (b) 
and (c). 

Also the reason for the existence of species is quite controversial Q. We are here referring to the 
formation of species in a spatially homogeneous environment, i.e. to sympatric speciation. In this frame 
of reference, a niche is a phenotypic realization of relatively high fitness. Species have obviously to do 
with niches, but one cannot assume that the coexistence of species simply reflects the presence of "pre- 
existing" niches; on the contrary, what appears as a niche to a given individual is co-determined by the 
presence of other individuals (of the same or of different species). In other words, niches are the product 
of co-evolution. 

Most authors consider sexual discrimination as the fundamental ingredient for the formation of species 



in flat (or almost flat) fitness landscapes (no niches). In this paper we deal only with asexual populations, 
so we stick to the phenetic definition (a). Instead of the "species" term (which has a very precise 
meaning for biologists P]) we adopt the weaker term "quasi-species" , to indicate a group of phcnotypically 
similar individuals, separated (again, in the phenotypic space) by other groups. We further use the term 
"speciation" to denote the process of division of a quasi-species in two or more groups, or, considering a 
quasi-static situation, the coexistence of several quasi-species. 

The speciation phenomenon can occur on static landscapes with severaH almost degenerate) maxima. 



which correspond to the niche concept. However, as discussed in Section ^^, in this case the coexistence 
depends sensitively on the mutation rate, or is just a transient phenomenon. On very rough landscapes, 
this transient behavior can extend to very long times, in a way similar to the spin glass behavior |1^, ll|. 

The roughness of the landscape depends also on the observation scale, and is minimal on the individual 
level and much larger on the species level. Since we are interested in the individual dynamics, we assume 
quite smooth landscapes. 

One of the first studies of evolution an a static landscape was performed by Eigcn and others [IJ, |l^ . 
They investigated the role of mutation in destroying a quasi-species located around the fittest strain. This 
effect is called the error threshold transition, and takes the form of a real phase transition (first order) in 
the limit of infinite genome length and infinite population Mm . The Error threshold for finite populations 
has been studied in Refs. [|5| |l^, [l^. This phase transition is not present for smooth landscapes (for 
an example of a study of evolution on a smooth landscape, see Ref. |19)). The limit of long genomes 
is interesting, since in this case one can neglect to consider back mutations |18[. In this case another 
mechanism is present, the so called Muller's ratchet [g^, |2l] or stochastic escapep^, p3| , which, for finite 
populations, causes the loosing of fitter strains by stochastic fluctuations. Since it relies on a random 
process, the average escape time is of the order of the exponential of the size of the strain pj|, and thus 
this effect is relevant only for small populations. 

On the other hand, this effect can lower the average fitness of the population below a viable threshold, 
after which extinction can occur pq | while the distribution still retains the quasi-species bell shape. Both 
the error threshold and the Muller's ratchet phenomenon will be investigated in Section pT3. 

In Section 0, we include the effects of the interactions among individuals belonging to the subsystem 
under study. The most fundamental terms are two-body interactions. In this paper we are mainly 
interested in the speciation phenomenon, i.e. to the differentiation of otherwise similar individuals. Intra- 
specific cooperation, i.e. the increase of reproduction rate due to the presence of a genetically related 
individual is presumably due to familiar or group structure, and will not be considered here. 

We assume that the more similar two individuals are, the greater is the competition term that lowers 
their fitness. One can think that this competition term arises from averaging over shared resources 
belonging to the outside world. 

One of the most interesting feature that originates from the inclusion of an intra-specific competition 
term is phenotypic differentiation. Let us assume that the static fitness landscape has a single smooth 
maximum. Without competition, the distribution of an asexually reproducing population would follow a 
bell shaped curve, peaked around the maximum and whose width is given by mutation and also by the 
curvature of the maximum (for small mutation rates). If the competition term is large enough, this single 
peak curve will split into several "quasi-species" , separated by the range of the inter-specific competition, 



as shown in Section 4.1 



The fact that one can observe phenotypic splitting of a population solely due to competition is very 
interesting. In presence of sexual reproduction (which has not been studied here) the competition lowers, 
since the phenotypic distribution is broader. This could well be a valid reason for the evolutionary 
survival of sexual reproduction in spite of its cost Eq, G%. Moreover, if the splitting due to competition 
holds even in presence of sexual reproduction (under study at present), it may be invoked as an effective 
mechanism for sympatric speciation. 

Mutations act on the genotype, while selection acts on the phenotype. A single mutation can greatly 
alter the phenotype. However, if one neglects immediately lethal mutations (they simply change the 



fitness of the genome), most of surviving mutants have very similar phenotypes with respect to the 
parent. Thus, in many cases one can get rid of the genotypic space: for large scale evolution (formation 
of species) one can assume that mutations induces a diffusion in the phenotypic space, while for small 
scale evolution (asymptotic distribution of intra-specific traits) one can assume that mutations are able 
to generate all possible phenotypic traits. 

The outline of this pap er is the following. In the next Section we discuss the ingredients needed in 
our models. In Section |2.3| we present a cellular automaton model of an evolving ecosystem whi ch is our 



microscopic reference system. The spatial mean field theory of it allows us to obtain (Section 2A) the well 
known mutation-selection equation (corresponding to a reaction-diffusion process in the genotypic spac e). 
We then develop some considerations about evolving ecosystems without competition: in Section 3T we 



examine to what extent an evolutionary system can be considered equivalent to an equilibrium statistical 



model; in Section 3.2 we present a toy model exhibiting the error threshold, the mutational meltdown 



|28| and the MuUer's ratchet effects; finally, in Section 3^ we derive the shape of a quasi-species as a 
function of the height and curvature of a maximum of the static fitness landscape. These last results 
will be useful to determine the distribution of quasi-species in presence of competition. Afterward, in 



Section 3.4 we analyze to what extent coexistence can be present on a static fitness landscape. 

In Section ^, we introduce the intra-specific competition interaction. We show that such a model 
can present phenotypic differentiation (speciation: formation of quasi-species) even in absence of sexual 
reproduction. Since the evolution in the genotypic space is difficult to analyze theoretically, we present 



in Section 4.1 a model of phenotypic evolution which still exhibits the speciation effect and that can be 



analyzed analytically. These approximate analytical results are confirmed by Monte Carlo simulations in 



the effective genotypic space, as reposted in Section 4.2. 
The conclusions are illustrated in the last Section. 

2 Preliminaries 

2.1 Genotypic space and mutations 

The fundamental ingredient of evolutionary models is the genetic information transmitted from one 
generation to the other, and the selection mechanism that models the distribution of this information. In 
what follows we consider discrete generations, since they are more directly implemented on a computer. 



The cellular automaton model that presented in Section 2.3 is our microscopic reference system. By 
varying some parameters it may present both non-overlapping and overlapping generations. 

In the simplest version, the genetic information (genotype) of an individual is represented by a binary 
string X = (xi, X2, ■ ■ ■ ,xl) of L symbols x^ = 0, 1. In this way we are modeling haploid (only one copy 
of each gene) organisms, i.e. bacteria or viruses or, more appropriately, more archaic, pre-biotic entities. 
The choice of a binary code is not fundamental but certainly makes things easier. It can be justified 
by thinking of a purine-pyrimidine coding of DNA or of good and bad alleles for genes. In this second 
version, represents a good gene and 1 a bad one. For bacteria, one can prove that major and minor 



codons work in such a way |29 . Sometimes we represent a genome using Ising-like variables Ui = 2xi — 1. 

During the individual life, errors (mutations) accumulate into the germ line (which for unicellular 
organisms coincide with the somatic line). When one individual reproduces, it duplicates its genetic 
material, and during this process other mutations can occur. These errors are transmitted to the progeny 
and cause the fact that even for asexual organisms the offspring can be different from the parent. We 
do not think that the differences between these two kinds of hereditary mutations causes qualitatively 
different behaviors, at least at the level of schematization of this work. 

While there is a great variety of possible mutations (insertions, deletions, transposition, inversions, 
etc), from a mathematical point of view it is preferable not to alter the length of the genome (otherwise we 
should introduce a three-symbols code, say 1 for good genes, -1 for bad genes and for empty positions. 



i.e. neutral genes). 

We consider two kinds of mutations: point mutation, that interchange a with a 1, and all other 
mutation that do not alter the length of the genome (transposition and inversions). 

We can define the distance between two genomes x and y as the minimal number of point mutations 
needed to pass from x to y, and this coincides with the Hamming distance d{x,y). Thus, all possible 
genomes of length L are distributed on the 2^ vertices of an hypercube. A point mutation corresponds 
to a unit displacement on that hypercube (short-range jumps). 

For simplicity we assume that all point mutations are equally likely, while in reality they depend on 
the identity of the symbol and on its positions on the genome. For real organisms, the probability of 
observing a mutation is quite small. We assume that at most one mutation is possible in one generation. 
We denote with fig the probability of having one point mutation per generation. 

The probability to have a point mutation from genotype y to genotype x is given by the short-range 
mutation matrix Ms{x,y) which is 

Us a X = y, 
Ms{x,y) = { ^ iid{x,y) = l, (1) 

otherwise. 

Other mutations correspond to long-range jumps in the genotypic space. A very rough approximation 
consists in assuming all mutations equiprobable. Let us denote with fig the probability per generation of 
this kind of mutations. The long-range mutation matrix. Mi, is defined as 

j I- fii iix ^y, 

e\ ^V) ~ S otherwise. ^ ' 

I 2^-1 

In the real world, only a certain kind of mutations are possible, and in this case M( becomes a sparse 
matrix Mi. We introduce a sparseness index s which is the average number of nonzero off-diagonal 
elements of Mg. The sum of these off-diagonal elements still gives /i^. In this case Mg is a quenched 
sparse matrix, and Mi can be considered the average of the annealed version. 

Both Ms and Mi are Markov matrices. Moreover, they are circular matrices, since the value of a 
given element does not depend on its absolute position but only on the distance from the diagonal. This 
means that their spectrum is real, and that the largest eigenvalue is Aq = 1. Since the matrices are 
irreducible, the corresponding eigenvector ^o is non-degenerate, and corresponds to the flat distribution 
Co(a^) = 1/2-^. We discuss in details the properties of these matrices in Appendix A. 

We use the following easier egg representation for quasi-species in the Boolean hyper-cubic space: 
starting from the origin of axis, we perform a step of a fixed length Rq with an angle wk / {2{L — 1)) — 7r/8 
if the nth bit (0 < n < i) of genotype x has value one. In this way one locates the master sequence 
(all zeros) at the origin; the strains with one bad gene, distributed according to the bad gene position at 
distance Rq] the strains with two bad genes at an approximate distance of 2Rq, and so on. An example 
of the resulting hypercube for L = 4 is shown in Figure nl 

2.2 Phenotypic space and selection 

The other necessary ingredient is the selection mechanism, which can either modify the survival probabil- 
ity of individuals or their reproduction efficiency. The selection does not act directly on the genome, but 
rather on the phenotype (how an individual appears to others). The phenotype u(x) of a given genotype 
X can be thought of as an array of morphological characteristics, and generally lives in a much simpler 
space than the genotype. We consider the simplest case, for which u is just a real variable. Moreover 
we consider m as a single- valued function of the genotype u = u{x), which is not the general case, since 
polymorphism or age dependence are usually present. 



In general, the selection mechanism is represented by the concept of the fitness function A |^, ^, 
B3, which depends on the phenotype of the individual and on those present in the environment. The 
fitness function can be defined as the average variation rate of the number of individuals that share that 
phenotype in a given environment, if at least one individual were present. This implies that one can speak 
of the fitness of phenotypes that never appeared in the environment. The fitness function summarizes 
the effects of reproduction and death rate, etc. The more apt an individual, the larger its fitness. 

If direct interactions are not present, then the fitness does not depend on the presence of other 
individuals, and it is a function of the individual phenotype only. This leads to the concept of fitness 
lands ca pe^ a nalogous to the potential surface for noninteracting particles. This analogy will be stressed in 



Section 3T. Evolution thus can be considered an adaptive walk in the fitness landscape [pSj , |34| looking for 
the fitness optimum where displacements in this space are due to mutations. In presence of a vanishing 
mutation rate and an infinite population, the system evolves until it reaches a local maximum of the 
fitness. The effects of muta tions and finite populations (genetic drift) can lead to the escape from a local 
maximum (see Section |3^ ). 

In the presence of a fiat part of the landscape {neutral evolution P5||), there is no preferred genotype, 
then the evolutive path is a random walk through the genotypic space |3^, 37 . 

As already noticed, the genotypic space is a high dimensional hypercube. Analytical results in such 
a space are very difficult to obtain. The phenotypic space is much simpler, and one is tempted to study 
evolution in such a space. For instance, assume that the phenotype u{x) is simply given by the sum of bits 
in X. The phenotypic space is thus degenerate (there are several genotypes with the same phenotype) and 
since point mutations change the phenotype by one unit, it is possible to write the mean-field evolution 
in phenotypic space by inserting the appropriate degeneration factors. For a generic fitness function one 
can study the evolution in phenotypic space only in the limit of a very fiat landscape and a vanishing 
mutation rate, assuming that mutations are able to connects all the genotypes, (for instance, using 
long-range mutations). 

We use the following form for the phenotype u 

n ^ -^ i-i 

u{x) = j^^'^i + Y~7\ X! ^»'^»+i + ^"nix), (3) 

where "qix) is a random function of x, uniformly distributed between —1 and 1 {{'n{x)rj{y)) = 5xy)- 
The magnetic term TL represents a non-epistatic interaction among loci, J^ represents a weakly rough 
landscape, while K, modulates a widely rough landscape (it can be thought as an approximation of a 
spin-glass landscape). 

We have not examined all possible combinations of all parameters. We set J = JC = except for the 



study in the almost fiat fitness landscape of Section 3.1 



The exact definition of the fitness function is deferred to the next section, in which a microscopic 



model will be presented. We see in Section |2.4| that in a mean field approach, the reproduction rate can 
be approximated by 

A(u)=exp(/3i/(7.)), (4) 

which has a form reminiscent of a statistical mechanics model. We call H the fitness tout courts while /3 
is a parameter that can be used to modulate the steepness of A. We always use (3=1. 
The fitness H of the strain Xi in the environment x = {x\, . . . , a;^} is defined as 

1 ^ 
H{x,,x) = V{u{x,)) + -Y,W{u{x,),u{xj)). (5) 

i=i 

The fitness is composed by two parts: the static fitness V, and the interaction term, whose kernel 
is the interaction matrix W . The field V represents a fixed or slowly changing environment; the matrix 



W defines the chemistry of the world and is fixed in time. A phenotype u with static fitness V{u) > 
represents individuals that can survive in isolation (say, after an inoculation into an empty substrate), 
while a strain with V{u) < represents predators or parasites that require the presence of some other 
individuals to survive (again, the phenotype can also not be present in the environment). In this paper 
we generally consider a static fitness function V{u), with a single maximum. 

The matrix W mediates the interactions between two strains. For a classification in terms of usual 
ecological interrelations, one has to consider together W{u, v) and W{v, u). One can have four cases: 

W{u,v) < W{v,u) < competition 

W{u, v) > W{v, u) < predation or parasitism of species u on species v 

W{u, v) < W{v, u) > predation or parasitism of species v on species u 

W{u,v) > W{v,u) > cooperation 

The interaction matrix W specifies the sources of food for non autonomous strains, but due to the 
average over the outside world, W can be an arbitrary matrix. Since the individuals with similar pheno- 
types are those sharing the largest quantity of resources, the competition is the stronger the more similar 
their phenotypes are (intra-species competition). This implies that the interaction matrix W has nega- 
tive components on and near the diagonal. In this paper we only study this feature, which we consider 
essential. The interaction matrix W assumes the following form: 



Wiu,v) = -JKi^—^j (6) 

where the parameter J > controls the intensity of the intra-species competition, and R is the phenotypic 
range over which the competition is effective. We introduce a specific form for the competition kernel 



K{{u - v)/R) in Section 4.1 



2.3 A cellular automata model for a simple ecosystem 

We describe here a cellular automaton model that should be considered as the microscopic reference 
model for the following. We actually study only the spatial mean field version of this model. 

Each individual occupies a cell of a lattice in a one dimensional space; the size of the lattice is Q sites. 
This automaton has a large number of states, one for each different genome plus a state (*) representing 
the empty cell. The evolution of the system is given by the application of two rules: the survival step, 
that includes the interactions among individuals, and the reproduction step. 

In the following, a tilde labels quantities after the survival step, and a prime those after the repro- 
duction step. 

2.3.1 Survival 

An individual Xi = Xi{t) 7^ * at time t and site «, « = 1, . . . , Q, has a probability tt of surviving per unit of 
time. It is reasonable to assume this probability to depend only on phenotypic characters. The survival 
probability vr = 7r(_ff) is expressed as a sigma-shaped function of the fitness function A{H) 

T,(H) = ^''^} , = ^ ,,„ =- + - tanh(/3H). (7) 



The survival phase is expressed as: 

(8) 



I * otherwise 
Clearly, empty cells remains empty during survival 



Xi with probability tt I H (u{xi) , x) ) , 



2.3.2 Reproduction 

Each non-empty cell tries to copy its state to all its Q neighboring cells with probability P^. If two or 
more cells try to copy themselves in the same site, the conflict is solved choosing one of them randomly. 
We are interested in two limiting cases: 

(a) Pr = 1/Q: an individual colonizes in average one of all its neighboring cells. 

(b) Pr — I: an individual colonizes all its neighboring cells. 

These rules can be implemented more easily as rules for empty cells. In case (a) each empty cell 
chooses randomly one of the neighboring cells (either empty or non-empty) and copies its state. If the 
newborn state is different from the empty state then mutations apply by reversing the value of one bit 
with probability /i. This implementation is completely equivalent to rule (a) for long-range couplings. 
This case, due to the low value of Pr, is particularly suitable to study the conditions that lead to the 
extinction of the whole population. 

In case (b) each empty cell chooses randomly one of the neighboring non-empty cells and copies its 
state; then mutations apply by reversing the value of one bit with probability fi. In the spatial mean-field 
approximation, the neighborhood extends to the whole population (Q -^ Q); in this limit, rule (b) is 
characterized by a constant population. 

In both cases we can notice that the effective reproduction rate does not only depend on the survival 
probability of the individual, but also on total availability of empty cells. 

2.4 Spatial mean field theory and the selection mutation equation 

One can have an insight of the features of the model by means of a simple spatial mean field analysis. Let 
n{x) = n{x,t) be the number of organisms with genotype x, N = N{t) is the total number of occupied 
cells 

iV = ^n(x), 

X 

m — N/Q is the average fraction of occupied cells and n^ the number of empty sites, with n,, + N = Q. 
Let us consider the two reproduction rules separately. 

2.4.1 Case (a): Variable Populations 

We can express the fitness H (and thus the survival probability tt) in terms of the number of individuals 
n{x) in a given strain or in terms of the probability distribution p{x) = n{x)/N 

H{u{x),n) = V{u{x)) + -Y,W{u{x),u{v))n{y)- 

V 

H{u{x),p) = V{u{x))+'m'^W(u{x),u{y))p{y). (9) 

V 

where p denotes the probability distribution. The average evolution of the system will be governed by 
the following equations: 

h{x) = 7r(u(x),n)n(x), 

n'{x) =n(x) + ^^M(x,y)n(y). (10) 



Using the Markov properties of the mutation matrices M, and summing over x in Eqs. (UCj), we obtain 
an equation for m: 

m = — = — >^ niuix), n)n(x) — ttvk 

Q Q ^-^ ^ ^ ' ' ^ ' 



2 



Q Q 

i.e. 

m' — 7TT.7f(2 — TO7f), (11) 

where 

7f = —^■K{u{x),n)n{x) = ^■n{u{x),p)p{x) 

X X 

is the average survival probabiHty. 

The normahzed evolution equation for p{x) is: 

TT{u{x),p,m)p{x) + {l-rnTf)Y,yM{x,y)T:{u{y),p,m)p{y) 

P (x) = ;::: ^ ■ (12) 

^ ^ ' 7f(2 - mW) ^ ' 

Notice that Eq. (O) is a logistic equation with n as control parameter. The stationary condition, 

(m' — m), is 

27f - 1 

m = — ^2 — ■ (13) 

One observes extinction if tt < 1/2. The decrease of 7f can be induced by a variation of the environment 
(notably V{x)) or by an increase of the mutation rate /x, which broadens the distribution p{x). This 
last effect corresponds to the mutational meltdown phenomenon, for which the population vanishes while 



continuing to exhibit a quasi-species distribution (see Section 3.2). Since the total population m multiplies 
the competition term in Eq. (|l2), one cannot observe coexistence of species due to competition near the 
mutational meltdown transition. From Eq. (O) one could expect a periodic or chaotic behavior of the 
population; however, since tF is always less than one, the asymptotic dynamics of the population m can 
only exhibit fixed points. 

2.4.2 Case (b): Constant Populations 

In the spatial mean-field limit (Q — > Q), the size of the whole populations does not change after survival 
and reproduction. Because all the empty sites are populated by new individuals (except the trivial case 
of all empty sites as initial condition) N ~ Q. In this case we can rewrite the fitness function H as 
function oi p{x) = n{x)/N 

H{u{x),n) = Viuix)) + ^ ^ Wiuix), uiy))n{y) 

H{u{x),p) ^ V{u{x)) + j:yW{u{x),u{y))p{y). 

The evolution equations after the survival (labeled by a tilde as before) and reproduction (labeled by 
a prime) are 

n{x) — Tr(u{x),Ti)n{x), 

n'{x) ^h{x) + -^^M{x,y)h{y). 
y 



The normalized evolution equation for p{x) is 

p'{x) = Tr{u{x),p)p{x) + f - - 1 j '^M{x,y)TT{u{y),p)p{y), (14) 

\ / y 

where the average survival probability tt is defined as 

7f = —^TT{u{x),n)n{x) = ^Tr{u{x),p)p{x) 

X X 

Since the size of whole population is constant, the mutational meltdown effect is not present, and we 
can consider the limit of small survival probability (non-overlapping generations) tt — > 0. In this limit 
■k{u{x),p) ~ A{x,p) and Eq. (n4h becomes 

,, . Y.yM{x,y)A{u{y),p)p{y) 
p {x) = -^ = . (15) 

with 

'A = ^A{u{x),p)p{x) 



Eq. ( |15D is known in the literature ||38| as the mutation selection equation, and defines a reaction- 
diffusion process in the genotypic space. An equivalent form is 

A{u{x),p)Y,yM(x,y)p[y) 
p (x) = ^ . (16) 

obtained from Eq. (na) by a shift of an half time step. 

For a vanishing mutation rate (i.e. only assuming that all phenotypes can be generated), the average 
fitness A of Eq. (nSl) is a monotonically increasing function of time |B3, M . 



3 Evolution on a static fitness landscape 

3.1 Path integral formulation of evolving ecosystems 

Let us consider the case in which the fitness A does not depend explicitly on the whole distribution p, 
i.e. no competition. Eq. (nS) can be linearized by using unnormalized variables z{x^t) that satisfy 

z{x, i + 1) = ^ A{u{y))M{x, y)z{y, t), (17) 

V 

with the correspondence 

pix,t)- ^^^'*^ 



y 
In vectorial terms, Eq. (|l^) can be written as 

z{t + l)^MAz{t), (18) 

where M^y = M{x,y) and A^ = A{u{x))6xy 
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When one takes into consideration only point mutations {M = Mg), Eq. ( |17| ) can be read as the 
transfer matrix of a two-dimensional Ising model pU B3, M , for which the gcnotypic element (t| corre- 
sponds to the spin in row t and column i, and z(cr, t) is the restricted partition function of row t. The 
effective Hamiltonian (up to constant terms) of a possible genealogical story {cr'} from time 1 < t < T is 



T-l 



^=E 7E'^'W^'(^ + l) + ^("(^)W) ' (19) 



t=l \ i=l 



where 7 = - ln(^s/(l - fJ-s))- 

This peculiar two-dimensional Ising model has a long-range coupling along the row (depending on 
the choice of the fitness function) and a ferromagnetic coupling along the time direction (for small short 
range mutation probability) . In order to obtain the statistical properties of the system one has to sum 
over all possible configurations (stories), eventually selecting the right boundary conditions at time t = 1. 

The bulk properties of Eq. ( |l9| ) cannot be reduced in general to the equilibrium distribution of a 
one dimensional system, since the transition probabilities among rows do not obey detailed balance. 
Moreover, the temperature-dependent Hamiltonian (09) does not allow an easy identification between 
energy and selection, and temperature and mutation, what is naively expected by the biological analogy 
with an adaptive walk. 

An Ising configuration of Eq. (M) corresponds to a possible genealogic story, i.e. as a directed polymer 
in the genotypic space um , where mutations play the role of elasticity. It is natural to try to rewrite the 
model in terms of the sum over all possible paths in genotypic space. 

3.1.1 Long-range mutations 

. Let us first consider the long-range mutation case. 

Eq. ([l8|), reformulated according to Eq. ([l6|), corresponds to 

z{t + T)^iAMiyzit). 

Since it is easier to consider the effects of A and Mg separately, let us study in which limit they commute. 
The norm of the commutator on the asymptotic probability distribution p is 

||[^M,]||=^|[AM,],,p^.|, 

and it is bounded by iigc, where c = maxy [An — Ajj \ . In the limit iiic — > (i.e. a very smooth landscape), 
to first order in c we have 

{AMiY = A'^M] + 0{t^^hc)A''-^MI-^, 

which is the analogous of the Trotter product formula. 

When T is order l//^£, AfJ is a constant matrix with elements equal to 1/2^, and thus M^p is a 
constant probability distribution. If ^i is large enough, {AMiY = A'^ MJ. 

The asymptotic probability distribution p is thus proportional to the diagonal oi A '^^: 

^(,)^Cexp(^^) (20) 

i.e. a Boltzmann distribution with Hamiltonian V{u{x)) and temperature /if. This corresponds to the 
naive analogy between evolution and equilibrium statistical mechanics. In other words, the genotypic 
distribution is equally populated if the phenotype is the same, regardless of the genetic distance since we 
used long-range mutations. 
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In terms of the sum over paths M, H , Eq. ( J17| ) becomes 

z{cr,T)= V ... Vexpf V-7di(^'(t + l),rT'(t))-T/('T'(i)UK,0) 



>; • 


/T-l 

. . ^ exp Y. 

<T'(0) \t=0 


cr'(T-l) 



where ctq = cr'(O) and cr = S'{T). In terms of directed polymers jdLicr'it + l),cr'(t)) is the bending 
contribution to the energy, while in term of path sums jd]^{(T'{t + 1), cr'(t)) is the kinetic energy (and 7 
is the mass of the particle) . 

The relevant paths in the mean field spirit are those that insist on a given genotype for a time order 
l/fj-e- Restricting the sum to the paths formed by segments of time length I /fie, the kinetic energy 
contributes a constant term. After rescaling the time of a factor t = I/m^, we have a free sum of the type 

z(<T,nr)= J2 exp(|]TV^(<T'W))^K,0) 

[(T'{n)] \i=0 J 

which gives the Boltzmann probability distribution (EO). 

We have checked numerically this mapping using a flat landscape V(u) = u and the complete phe- 
notype (H). We iterated Eq. (O) for a given genome length L and for a time t large enough to be sure 
of having reached the asymptotic state. We plotted the logarithm of the probability distribution ■p{x) 
versus the value of the Hamiltonian V{u{x)). We computed the slope l//ie and the average of quadratic 
differences x^ of the linear regression. 

The results are shown in Figure 0. We see that the equilibrium hypothesis is well verified in the limit 
fig 3> c; and that convergence is faster for a rough landscape. 

Since in reality the long-range mutations follow preferred patterns, we checked that the inclusion of 
a sparseness factor s (the number of off-diagonal elements different from zero) do not alter these results. 
In Figure ^ we show that the average of quadratic differences x^ keeps low even for very small values 
of the sparseness s, and that the transition point vanishes when increasing L. The slope of the line is 
almost constant for all values of the sparseness factor s. This implies that for large enough genomes the 
sparseness of the long range mutation matrix does not alter the statistical mechanics interpretation of 
selection and mutations. 

3.1.2 Short-range mutations 

The above results hold qualitatively also for short-range mutations as shown in Figure |j. A small 
dispersion of points in Figure implies that short-range mutations are sufficiently strong to cancel out the 
dependence on the genetic vicinity of strains with the same phenotype to strains with different fitness. 
This does not happen for the very rough landscape case, even though the linear relation is satisfied in 
average. 

In order to obtain a quantitatively correct prediction, one has to consider that the resulting slope fig 
is related to the second largest eigenvalue Ai of the mutation matrix by /i = 1 — Ai. When the phenotype 
only depends on the "magnetic" term Ti, in the asymptotic state the short-range mutations connects 
group of equal fitness. Thus A and Ms commutes and indeed, in the limit fig -^ 0, fie tends towards the 
expected values 2fis/L of Eq. (p^. 

In the opposite case, when the phenotype depends on the disordered term /C, the application of the 
matrix A "rotates" the distribution p in a way which is practically random with respect to the Fourier 
eigenvectors of Ms- Thus, the effective second eigenvalue of the mutation matrix is given by \ — fis, 
obtained averaging over all the eigenvalues of Eq (p4). Consequently, we obtain fi^ — fJ-s, but clearly this 
convergence is quite slow, and is observed only in the limit /i^ ^ c ^ 0. 

When only the J^ is present, one observes an intermediate case, which converges very slowly to the 
disordered case for L -^ 00. 
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The simultaneous application of all three terms or the mixing of long and short-range mutations has 
an addictive effect, at least at first order. 

3.2 A toy model for the error threshold and the mutation meltdown 

Before going in deep studying a general model of an evolving ecosystem that includes the effect of 
competition (co-evolution), let us discuss a simple model Eq] that presents two possible mechanisms of 
escaping from a local optimum, i.e. the error threshold and the MuUer's ratchet. 

We consider a sharp peak landscape: the phenotype uq = 0, corresponding to the master sequence 
genotype x = = (0, 0, . . .) has higher fitness Aq — A{0), and all other genotypes have the same, lower, 
fitness A^. Due to the form of the fitness function, the dynamics of the population is fundamentally 
determined by the fittest strains. The effect of global competition for shared resource is considered 
introducing a standard Verhulst factor, similar to Eq. (O). 

Let us indicate with uq — n(0) the number of individuals sharing the master sequence, with ?ii — n(l) 
the number of individuals with phenotype u = 1 (only one bad gene, i.e. a binary string with all zero, 
except a single 1), and with n, all other individuals. We assume also non-overlapping generations. 

During reproduction, individuals with phenotype uq can mutate, contributing to ni, and those with 
phenotype ui can mutate, increasing n*. We disregard the possibility of back mutations from u.^, to ui 
and from ui to uq. This last assumption is equivalent to the limit L — > cx), which is the case for existing 
organisms. We consider only short-range mutation with probability ^s- Due to the assumption of large 
L, the multiplicity factor of mutations from ui to u* (i.e. L — I) is almost the same of that from uq to 
ui (i.e. L). 

We assume a finite (and constant) carrying capacity K of the environment, assuming that the effective 
reproduction rate of a population is proportional to the Verhulst factor 1 — N/K, where N = no + iii + n^ 
is the total number of individuals. 

The evolution equation of the population is 

N\ 

A / 

n[ = [l- — \ {{1~ ^s)A^ni+ ^sAono), (21) 

N\ 
1 - — A,(n, + fisni). 
K / 

The evolution equation for the total population is 

— / N 

N' ^An(i 

V K 

where 

-;-_ Aquq + A^{ni +n^) 

"^^ N 

is the average fitness of the population. 

The steady state of Eq. (|2l|) is given by n' = n. There are three possible fixed points: 



4^' = 


= 


n'l' = 


= 0, 


nl^' = 


= 0, 


/v(i) = 


= 0: 



P^^l "1 - "' (22) 
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- 0, 



,(2) _ 



P2= < 



,(2) 



= 0, 

iV(2) 



(23) 



and 



A^(2) = K 1- = 
A, 



(3) _ ^(3) (1_AO^10_j4, 

'" " Ao-A^ ' 

(3) ^ ^(3). Ms AoiqAo-A^) 



P3= <^ 



l-/is (Ao-A)-" 

,(3) _ Ar(3) Ms -^0-^* 

1 



(24) 



iV(3) 



1 



Ao{l-fis) 

The fixed point Pi corresponds to extinction of the whole population, i.e. to mutational meltdown 
(MM). It is trivially stable if ^o < li but it can acquire stability also if Aq > 1, A^, < 1 and 



1 

^0 



(25) 



The fixed point P2 corresponds to a distribution in which the master sequence has disappeared even 
if it has larger fitness than other phenotypes. This effect is usually called MuUer's ratchet (MR). The 
point P2 is stable for Aq > 1, ^* > 1 and 



^J.s > 



Ao/A^ 



1 



Ao/A^ 



(26) 



The fixed point P3 corresponds to a coexistence of all phenotypes. It is stable in the rest of cases, 
with Aq > 1. The asymptotic distribution, however, can assume two very different shapes. In the quasi- 
species (QS) distribution, the master sequence phenotype is more abundant than other phenotypes; after 
increasing the mutation rate, however, the numeric predominance of the master sequence is lost, an effect 
that can be denoted error threshold (ET). The transition between these two regimes is given by no — ni, 

a - "^^1^*-^ (27) 

Our definition of the error threshold transition needs some remarks: in Eigen's original work ||l3, n3 the 
error threshold is located at the maximum mean Hamming distance, which corresponds to the maximum 
spread of population. In the limit of very large genomes these two definitions agree, since the transition 

See also Refs. 



becomes very sharp |14 
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In Figure ^ we reported the phase diagram of model ( |2l| ) for ^^ > 1 (the population always survives) . 
There are three regions: for a low mutation probability ^s and high selective advantage Aq/A^, of the 
master sequence, the distribution has the quasi-species form (QS); increasing /i^ the distribution under- 
goes the error threshold (ET) effect; finally, for very high mutation probabilities, the master sequence 
disappears and we enter the Muller's ratchet (MR) region [2^, [l6| . 

In Figure |6| we illustrate the phase diagram in the case ^* = 0.5. For a low mutation probability ^s and 
high selective advantage Aq/A^ of the master sequence, again one observes a quasi-species distribution 
(QS), while for sufficiently large /j,s there is the extinction of the whole population due to the mutational 
meltdown (MM) effect. The transition between the QS and MM phases can occur directly, for 



Ao/A^ < 



i-^/i~a: 

A, 



(28) 
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(dotted QS-MM line in Figure): during the transient before extinction the distribution keeps the QS 
form. For 

AoM, > 1-^^^^ (29) 

one has first the error threshold transition (QS-ET line in Figure), and then one observes extinction due 
to the mutational meltdown effect (dashed ET-MM line in Figure). This mutation- induced extinction 
has been investigated numerically in Ref. p^ . 

We finally report in Figure 1^ the phase diagram of the model in the ^* < 1 case, for some values of 
A^,. Notice that for A^ = 1 the mutational meltdown effect coincides with the MuUer's ratchet one. 

Here the mutation probability fis is defined on a per-genome basis. If one considers a fixed mutation 
probability fi per genome element, one has fig — L^, where L is the genome length. Thus, it is possible 
to trigger these phase transitions by increasing the genome length. 

3.3 Evolution in a phenotypic space 

In the previous section we have obtained some information about the stability of a quasi-species dis- 
tribution. In the following we want to study the stability of a distribution formed by more than one 
quasi-species, i.e. the speciation phenomenon. Before doing that we need to know the shape of a quasi- 
species given a static fitness landscape. Some analytical results can be obtained by considering the 
dynamics only in the phenotypic space. 

We assume that the phenotypic index u ranges between — oo and oo in unit steps (the fitness landscape 
provides that only a finite range of the phenotypic space is viable), and that mutations connect phenotypes 
at unit distance; the probability of observing a mutation per unit of time is fi. The mutational matrix 
M{u, v) has the form: 

{/i if|u,w| = l, 

otherwise. 

Let us consider as before the evolution of phenotypic distribution p{u) , that gives the probability of 
observing the phenotype u. As before the whole distribution is denoted by p. 

Since we are interested in studying the speciation transition in this section we consider model (b) 
(Section ^.4.2| ), the results of this analysis can be partly applied to model (a) as discussed at the end of 
this section. 

Considering a phenotypic linear space and the non-overlapping generations limit (tt -^ 0) we get from 
Eq.([5l) 

^ (1 - 2fi)A{u,p)p{u) + fi{A{u + l,p)p{u + 1) + A{u - l,p)p{u - 1) 

In the limit of continuous phenotypic space, u becomes a real number and 

1 / / N . ^ d'^A(u,p)p(u)\ , , 

p'(u)^=\A{u,p)p(u) + ^i V_i^Zi^J, (30) 

with 

p{u)du — \, \ A{u,p)p(u)du — A. (31) 



Eq. ( pOD has the typical form of a nonlinear diffusion-reaction equation. The numerical solution of 
this equation shows that a stable asymptotic distribution exists for almost all initial conditions. 
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The fitness A{u,p) = cxp{H{u,p)) can be written as before, witli 

/oc 
J{u,v)p{v)dv. 
-oo 



Before studying the effect of competition and the speciation transition let us derive the exact form of 
p{u) in case of a smooth and sharp static fitness landscape. 

3.3.1 Evolution near a smooth and sharp mzLximum 

In the presence of a single maximum tire asymptotic distribution is given by one quasi-species centers 
around the global maximum of the static landscape. The effect of a finite mutation rate is simply that 
of broadening the distribution from a delta peak to a bell-shaped curve. 

We are interested in deriving the exact form of the asymptotic distribution near the maximum. We 
take a static fitness A{u) with a smooth, isolated maximum for u = {smooth maximum approximation). 
Let us assume that 

^(u)~v4o(l-au^), (32) 

where Aq — A{0). Substituting exp(u') = Ap in Eq. ( |30| ) we have (neglecting to indicate the phenotype 
u, and using primes to denote differentiation with respect to it): 

— = 1 + ^J,[w +w ), 
and approximating A~^ = Aq (l + au^), we have 

4-il + au^) = l + tiiw'^+w"). (33) 



A possible solution is 

W[U) — T-. 



Substituting into Eq. (p3) we finally get 



A _ 2 + a^- \J'^a\i + o?-y? 
Tq~ 2 ■ ^ ^ 

Since Aj Ai^ is less than one we have chosen the minus sign. In the limit a/z — > (small mutation rate 
and smooth maximum), we have 

1 — -^/a/I (35) 



Ao 



and 



The asymptotic solution is 



''-f. 




1 + au^ ( 


2 
U 


V2Tra{l + aa^) '' \ 


2a' 



(36) 



p(^) = f7^ ,^ ^T^^P ~7r3 ' (37) 



so that J p{u)du = 1. The solution is then a bell-shaped curve, its width a being determined by the 
combined effects of the curvature a of maximum and the mutation rate fx. 
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For completeness, we study here also the case of a sharp maximum, for which A[u) varies considerably 
with u. In this case the growth rate of less fit strains has a large contribution from the mutations of 
fittest strains, while the reverse flow is negligible, thus 

p{u - l)A{u - 1) > p{u)A{u) > p{u + l)A{u + 1) 

neglecting last term, and substituting q{u) ~ A{u)p{u) in Eq. ( pOJ ) we get: 

(38) 
(39) 









Ao 


1 - 2ii for 


u = 




and 


















q{u) = 


(AAiu) - 


-fl(w - 

1 + 2m)^^ 


1) for 


M>0 


Near u 


= 0, combining 


Eq. H), Eq. (|9|)and Eq. ||)), 


we have 










q{u) = 




w- 1). 




In this 


approximation 


the solut 


ion is 

q{u) = 


( A* y 


1 






\\-2iJia) 




and 




yiu)^ 


-- A{v)q{u) 


~ -^(1 + au 
Aq 


2) [^M^ y 

\ Aa ) 


1 



We have checked the validity of these approximations by solving numerically Eq. (^0|) ; the comparisons 
are shown in Figure (|^). We observe that the smooth maximum approximation agrees with the numerics 
for small values of a, when A[u) varies slowly with u, while the sharp maximum approximation agrees 
with the numerical results for large values of a, when small variations of u correspond to large variations 

of^(u). 

3.4 Coexistence on a static fitness landscape 

We study here the conditions for which more than one quasi-species can coexist on a static fitness 
landscape without competition. 

Let us assume that the fitness landscape has several distinct peaks, and that any peak can be ap- 
proximated by a quadratic function near its maximum. For small but finite mutation rates, as shown by 



Eq. (37), the distribution around an isolated maxima is a bell curve, whose width is given by Eq. (pq 



and average fitness by Eq. (35). Let us call thus distribution a quasi-species, and the peak a niche. 

If the niches are separated by a distance greater than a, a superposition of quasi-species ( p7\ ) is a 
solution of Eq. (^q). Let us number the quasi-species with the index k: 

k 

each pk {u) is centered around Uk and has average fitness Ak ■ The condition for the coexistence of two 
quasi-species h and k is Ah = Ak (this condition can be extended to any number of quasi-species). In other 
terms one can say that in a stable environment the fitness of all individuals is the same, independently 
on the species. This is a sort of global competition among all quasi-species, due to the conservation of 
probability (finite carrying capacity). 

17 



Since the average fitness ( |35| ) of a quasi-species depends on tlie height ^o and the curvature a of the 
niche, one can have coexistence of a sharper niche with larger fitness together with a broader niche with 
lower fitness, as shown in Figure M. However, this coexistence depends crucially on the mutation rate 
/i. If /i is too small, the quasi-species occupying the broader niche disappears; if the mutation rate is 
too high the reverse happens. In this case, the difference of fitness establishes the time scale, which can 
be quite long. In presence of a fluctuating environment, these time scales can be long enough that the 
extinction due to global competition is not relevant. A transient coexistence is illustrated in Figure ^. 
One can design a special form of the landscape that allows the coexistence for a finite interval of values 
of yU, but certainly this is not a generic case. 

We show in the following that the existence of a degenerate effective fitness is a generic case in the 
presence of competition. 

4 Evolution on a dynamic landscape: the role of competition 

4.1 Speciation in the phenotypic space 

In this section we introduce a new factor in our model ecosystem: a short-range (in phenotypic space) 
competition among individuals. As usual, we start the study of its consequences by considering the 
evolution in phenotypic space ^, Q . 

We assume that the static fitness V{u), when not flat, is a linear decreasing function of the phenotype 
u except in the vicinity of u = 0, where it has a quadratic maximum: 

nu)-Vo + b(l-^--^) (40) 

so that close to u = one has V{u) ~ Vo ~ bv? /r'^ and for u -^ cx), V{u) ~ Vq + 5(1 — u/r). The master 
sequence is located at u = 0. 

We have checked numerically that the results are qualitatively independent on the exact form of the 
static fitness, providing that it is a smooth decreasing function. We have introduced this particular form 
because it is suitable for analytical computation, but a more classic Gaussian form can be used. 

For the interaction matrix W we have chosen the following kernel K 



R 
The parameter J and a control the intensity and the steepness of the intra-species competition, respec- 



tively. We use a Gaussian (a = 2) kernel, for the motivations illustrated in Section 4.1. 

For illustration, we report in Figure O the numerical solution of Eq. dlS) , showing a possible evolu- 
tionary scenario that leads to the coexistence of three quasi-species. We have chosen the smooth static 
fitness V{u) of Eq. (Hfl) and a Gaussian (a = 2) competition kernel. One can realize that the effective 
fitness H is almost degenerate (here /j, > and the competition effect extends on the neighborhood of 
the maxima), i.e. that the average fitness of all coexisting quasi-species is the same. 

We now derive the conditions for the coexistence of multiple species. We are interested in its asymp- 
totic behavior in the limit /i — > 0, which is the case for actual organisms. Actually, the mutation 
mechanism is needed only to define the genotypic distance and to populate all available niches. Let us 
assume that the asymptotic distribution is formed by C quasi-species. Since /j, ^ they are approximated 
by delta functions pk{u) = Jk5u,ukj fc = 0, . . . ,£ — 1, centered at Uk- The weight of each quasi species is 
Ik, i.e. 

Pk{u)du = jk, y^ 7fc = 1- 

fc=0 

18 



The quasi-species are ordered such as 70 > 71, . . . , > 7£_i. 
The evolution equations for the pk are 



where A{u) = exp{H{u)) and 






C-1 

H(u) = V{u)-J^K 
3=0 



R 



li- 



The stabihty condition of the asymptotic distribution is {A{uk) — A)pk{u) — 0, i.e. either A{yk) = 
A — const (degeneracy of maxima) or pk{u) — (all other points). This supports our assumption of delta 
functions for the pk ■ 

The position Uk and the weight jk of the quasi-species are given by A{uk) = A = const and 
dA{u)/du\^ — 0, or, in terms of the fitness H , by 






where the prime in the last equation denotes differentiation with respect to u. 

Let us compute the phase boundary for coexistence of three species for two kinds of kernels: the 
exponential {a = 1) and the Gaussian [a = 2) one. The diffusion kernel can be derived by a simple 



reaction-diffusion model, see Ref. |47 



We assume that the static fitness V{u) of Eq. (^Oj). Due to the symmetries of the problem, we have 
the master quasi-species at uq = and, symmetrically, two satellite quasi-species at u = ±wi. Neglecting 
the mutual influence of the two marginal quasi-species, and considering that V'{u{)) = K'{uq/R) = 0, 
K'{ui/ R) = —K'{—ui/R), K{0) — J and that the three-species threshold is given by 70 = 1 and 71 = 0, 
we have 

r 

where u = u/R, f = r/R and b = b/J. We introduce the parameter G = f/b = {J/R)/{b/r), that is the 
ratio of two quantities, the first one related to the strength of inter-species interactions {J /R) and the 
second to intra-species ones (b/r). 

In the following we drop the tildes for convenience. Thus 

r — z — Gexp 1 j = — G, 

/ z" 

Gz"-iexp 

V a 

For a — 1 we have the coexistence condition 

ln(G) = r - 1 + G. 
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The only parameters that satisfy these equations are G — 1 and r = 0, i.e. a flat landscape (6 = 0) with 
infinite range interaction {R — oo). Since the coexistence region reduces to a single point, it is suggested 
that a = 1 is a marginal case. Thus for less steep potentials, such as power law decrease, the coexistence 
condition is supposed not to be fulfilled. 

For a — 2 the coexistence condition is given by 

z^ -{G + r)z + 1 = 0, 

Gzexp f-yj = 1- 

One can solve numerically this system and obtain the boundary Gc{r) for the coexistence. In the limit 
r ^ (almost flat static fitness) one has 

Ge(r)~Ge(0)-r (41) 

with Gc(0) = 2.216 . . .. Thus for G > Gc{r) we have coexistence of three or more quasi-species, while for 
G < Gc{r) only the fittest one survives. 

We have solved numerically Eq. (^^ for several different values of the parameter G. We have consid- 
ered a discrete phenotypic space, with TV points, and a simple Euler algorithm. The results, presented in 
Figure |lj, are not strongly affected by the integration step. The error bars are due to the discreteness of 
the changing parameter G. The boundary of the multi-species phase is well approximated by Eq. ([4l|); 
in particular, we have checked that this boundary does not depend on the mutation rate /i, at least for 
fjL < 0.1, which can be considered a very high mutation rate for real organisms. The most important effect 



of /i is the broadening of quasi-species curves, which can eventually merge as described in Section 3.3.1 
This approximate theory to derive the condition of coexistence of multiple quasi-species still holds for 
the hyper-cubic genotypic space. The different structure of genotypic space does not change the results in 
the limit /i -^ 0. Moreover, the threshold between one and multiple quasi-species is defined as the value 
of parameters for which the satellite quasi-species vanish. In this case the multiplicity factor of satellite 
quasi-species does not influence the competition, and thus we believe that the threshold Gc of Eq. (Blh 
still holds in the genotypic hyper-cubic space. 

For rule (a) with a variable population Eq. dl3), the theory still works substituting G with 

Ga=mG = m^, (42) 

b/r 

due to the presence of m factor in Eq. (Bh (for a detailed analysis see Ref . M] ) . 

This result is in a good agreement with numerical simulations, as shown in the following Section. 

4.2 Speciation and mutational meltdown in the hyper-cubic genotypic space 

Let us now study the consequences of evolution in presence of competition in the more complex genotypic 
space. We were not able to obtain analytical results, so we resort to numerical simulations. Some details 
about the computer code we used can be found in Appendix B. 

In the following we refer always to rule (a), that allows us to study both speciation and mutational 
meltdown. Rule (b) has a similar behavior for speciation transition, while, of course, it does not present 
any mutational meltdown transition. 

We considered the same static fitness landscape of Eq. (^, and the simple genotype-phenotype 
mapping J = JC = \n Eq. (||) (non-epistatic interactions among genes). 

We observe, in good agreement with the analytical approximation Eq. (|4l|), that if Ga (Eq. (E2h) is 
larger than the threshold Gc (Eq. (|4l| )), several quasi-species coexist, otherwise only the master sequence 
quasi-species survives. In Figure Oa distribution with multiple quasi-species is shown. 
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We can characterize the speciation transition by means of the entropy S of the asymptotic phenotypic 
distribution p{u) as function of Ga, 

S = -^p{u)\np{u) 

u 

which increases in correspondence of the appearance of multiple quasi-species. 

We locate the transition at a value Ga — 2.25, while analytical approximation predicts Gc(O.l) ~ 
2.116. The entropy, however, is quite sensible to fluctuations of the quasi-species centered at the master 
sequence (which embraces the largest part of distribution), and it was necessary to average over several 
runs in order to obtain a clear curve; for larger values of /i it was impossible to characterize this transition. 
A quantity which is much less sensitive of fluctuations is the average square phenotypic distance from 
the master sequence g{u)^ 

giuY = y^g{ufp{u). 



In Figure Em (left) we characterize the speciation transition by means of g{u)^, and indeed a single run 
was sufficient, for /i — 10~^. For much higher mutation rates {n = 5 10~^) the transition is less clear, as 
shown in Figure Qq (right), but one can see that the transition point is substantially independent of /j,, 
as predicted by the analysis of speciation in the phenotypic space, Eq. ([4l|). 

Another interesting phenomenon is the meltdown transition, for which the spatial mean field theory 
predicts extinction if 7f < 1/2, Eq. (|l3[). In Figure |l6| we report the result of one simulation in which 
the extinction is induced by the increase of the mutation rate fi. One can notice that the transition is 
discontinuous, m jumps to zero from a value of about 0.15, and that the critical value of 7f is larger that 
the predicted one. This discrepancy can be caused by fluctuations, due to the finiteness of population. 

5 Discussion 

We studied some simple models of evolving asexual populations on a smooth landscape. After some 
preliminary descriptions, we introduced a microscopic cellular automaton model, and we obtained its 
spatial mean-field equations. Afterwards, we studied the consequences of evolution in a static fitness 
landscape. First of all we investigated the limit of validity of the naive relationship between energy 
and selection, and temperature and mutations, i.e. between the time evolution of the population and 
equilibrium properties of a statistical mechanics system. We found that indeed a limit exists for which 
this mapping is valid. We then studied the mutational meltdown, Eigen's error threshold and MuUer's 
ratchet phenomena introducing a minimal model. It is shown that the mutational meltdown may overcome 
the error threshold transition, while this does not happen with the Muller's ratchet effect. Finally, we 
derived the shape of a quasi-species and the condition of coexistence of multiple species in a fixed fitness 
landscape. For vanishing mutation rate, the coexistence of many species that share the same environment 
is related to the degeneracy of maxima of the static fitness, that is a rather peculiar condition. In the 
presence of a finite mutation rate, the highest maxima start being populated and more quasi-species 
appear. This speciation mechanism depends crucially on the mutation rate. This is in contrast with the 
biological data, that indicate speciation is very little influenced by the mutation rate. 

In the second part, we applied these results to the study of coexistence in presence of competition, 
obtaining the conditions for robust speciation in asexual populations. Our model includes the competition 
among individuals, and this ingredient is fundamental for the speciation phenomenon in a smooth fitness 
landscape. In fact in presence of a competition term large "enough" we observe the stable coexistence of 
many quasi-species. 

This speciation transition does not depend on the mutation rate provided that this rate is small. We 
are also able to obtain analytical approximations for the onset of both transitions. 
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Appendix A: Properties of mutation matrices 



We discuss here some properties of the mutational matrices introduced in Section 2.1, Those are circular 



Markov matrices on a hyper-cubic space. In analogy with circular matrices in the usual space, one 



can obtain their complete spectrum (which will be used in Section 3.1) using the analogous of Fourier 



transform in a Boolean hyper-cubic space. Let us define the "Boolean scalar product" 0: 

L 

xQy = ^Xiy„ 

i=l 

where the symbol © represent the sum modulus two (XOR) and the multiplication can be substituted by 
an AND (which has the same effect on Boolean quantities). This scalar product is obviously distributive 
with respect to the XOR: 

(cc e y) z = (a: z) ® (y z). 

Given a function f{x) of a Boolean quantity x e {0, 1}^, its "Boolean Fourier transform" is f{k) 

(fee {0,1}^) 

X 

The anti-transformation operation is determined by the definition of the Kronecker delta 

X 

One can easily verify that the Fourier vectors 

are eigenvector of both M^ and Mg , with eigenvalues 

Ao -1, 

2"^ -1 
for the long-range case, and 



\ 1 '^'' (43) 

Afe = 1 - ^f - -J ^ '' 



Ao -1, 

2/i.rf(fc,0) (44) 

Afe =1 , 

for the short-range case, where d{x, y) is the Hamming distance. 

Appendix B: Monte Carlo Algorithm 

We schematically describe here the code we use to simulate the evolution of the population in the genotypic 
space in presence of competition. The source code (in C) is available upon request. 
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The problem of performing a massive simulation of the ecosystem described in this paper relies in 
Eq. (la): we have to compute N'^ terms, where N is the size of populations. Since individuals with the 
same phenotype contribute equally to the competition term W ^ we can rewrite Eq. ^ as 



H{u, n) = V{u) + -TT ^ W{u, v)n{v), 



N 

where we have replaced a sum over the individuals with a a sum over phenotypes. This solves the problem 
if the possible phenotypes are few, as for instance if one has J ^ K, = Qin Eq. (pi) . In this case there are 
L + 1 possible phenotypes, regardless of N. When the number of possible phenotypes is large (maybe 
of the order of the population size, as happens if /C 7^ 0), one has to avoid the summation over those 
phenotypes not present in the environment. This may be done by keeping a list of the actually present 
phenotypes, with the relative multiplicity. A hash function (for instance, a table) is needed to quickly 
access the list. Furthermore, there is the possibility of grouping the phenotypes in bins. In practice, this 
is realized by addressing the list of phenotypes by means of a discrete phenotypic index. 
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Figure 1 : The caster egg representation of the Boolean hypercube for L = 4 
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Figure 2: Numerical check for long-range mutations. In the simulations we set L = 8, /i^ = 0.1 and 
/is = 0. We varied H. (a,b), J (c,d) and K, (e,f), setting all other parameters to zero, (a) H. — 0.01, 
J = 0, /C = 0; (b) W = 0.001, J^ = 0, /C == 0; (c) W = 0, J = 0.01, /C == 0; (d) H = 0, J7 = 0.001, 
/C = 0; (e) Ti: = 0, J = 0, /C = 0.1; (f) H = 0, J' = 0, /C = 0.01. In the figures t indicates the number of 
generations, /i the reciprocal of the slope of linear regression (/ig in the text). 
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Figure 4: Numerical check for short-range mutations. In the simulations we set L = 8, fig ^ and 
lis = 0.1. We varied H (a,b), J' (c,d) and K, (e,f), setting all other parameters to zero, (a) H = 0.1, 
J ^ 0, IC = 0; (h) H ^ 0.01, J = 0, IC = 0; (c) H ^ 0, J ^ 0.01,/C = 0; (d) H = 0, J = 0.001, /C = 0; 
(e) n = 0, J = 0, IC = 0.01; (i) H = 0, J = 0, JC = 0.001. In the figures t indicates the number of 
generations, /i the reciprocal of the slope of linear regression (/ig in the text). 
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Figure 5: Phase diagram for the error threshold and Muher's ratchet transitions (A* > 1). MR refers to 
the Muher's ratchet phase ET to the error threshold distribution and QS to quasi-species distribution. 
The phase boundary between the Muher's ratchet effect and the error threshold distribution (Eq. (p5|)) 
is marked ET-MR; the phase boundary between the error threshold and the quasi-species distribution 
(Eq. (13)) is marked QS-ET. 
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Figure 6: Phase diagram for the mutational meltdown extinction, the error threshold and the quasi- 
species distributions [A^, < 1). MM refers to the mutational meltdown phase, ET to the error threshold 
distribution and QS to quasi-species distribution. The phase boundary between the Mutational meltdown 
effect and the error threshold distribution (Eqs. ( P6[) and (^9|)) is marked ET-MM; the phase boundary 
between the mutational meltdown and the quasi-species distribution (Eqs. ( |2q ) and (28)) is marked 
QS-MM. 
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Figure 7: Phase diagram for the error threshold and mutational meltdown transitions for some values 
of v4*. ET-QS refers to the Error threshold transition, Eq. ( p7|), QS-MM to the mutational meltdown 
extinction without the error threshold transition, Eqs . (pq ) and (|2§| ), ET-MM to the mutational meltdown 
extinction after the error threshold transition, Eqs. (|26[) and (|29|). The line MR-ET marks the MuUer's 
ratchet boundary, Eq. (Eq), which coincides with the mutational meltdown (MM) boundary for A^, — 1. 
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Figure 8: Average fitness A/Aq versus the coefficient a, of the fitness function, Eq. (p^), for some values 
of the mutation rate fi. Legend: numerical solution corresponds to the numerical solution of Eq. (|30|), 
smooth maximum refers to Eq. (pj) and sharp maximum to Eq. 
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Figure 9: Mutation induced speciation. A two peaks static fitness landscape, increasing the mutation 
rate we pass from a single quasi-species population (left, fi = 0.12) to the coexistence of two quasi-species 
(right, ^ = 0.14). 
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Figure 10: Evolution in a two peaks static fitness landscape, after 500 (left) and 5000 (right) time steps. 
For a transient period of time the two species co-exist, but in the asymptotic limit only the fittest one 
survives. 
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Figure 11: Static fitness V, effective fitness H, and asymptotic distribution p numerically computed for 
the following values of parameters: a = 2, /j, ~ 0.01, Vb = 1-0, b = 0.04, J = 0.6, i? = 10, r = 3 and 
N = 100. 
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Figure 12: Three-species coexistence boundary Gc for a ~ 2. The continuous hue represents the 
analytical approximation, Eq. (El|), the circles are obtained from numerical simulations. The error bars 
represent the maximum error (see text for details). 
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Figure 13: Easter egg representation of quasi-species in hyper-cubic space for L ~ 12. The smallest 
points represent placeholder of strains (whose population is less than 0.02), only the larger dots correspond 
to effectively populated quasi-species; the area of the dot is proportional to the square root of population. 
Parameters: ji = 10"^, Fq = 2, 6 = lO'^, E = 5, r = 0.5, J = 0.28, N = 10000, L = 12. 
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Figure 14: The speciation transition characterized by the entropy S' as a function of the control parameter 
G m. Each point is an average over 15 runs. Same parameters as in Figure |3[ varying J. Errors are of 
the order of fluctuations. 



18 
16 
14 
12 

10 

6 
4 

2 




aM 




Figure 15: Independence of the speciation transition by the mutation rate. The transition is characterized 
by the average square phenotypic distance g{uY of phcnotypic distribution p{u), as a function of the 
control parameter G m. Each point is a single run. Same parameters as in Figure Es, varying J with 
/I = 10"^ (left) and /i = 5 10"^ (right). 
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as a function of mutation rate ^. Here J — 0.3, Vb = 0.4, b = 0.35, r = 0.5, R = 5, N = 2000, L = 8. 
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